SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method

Author

  • R. Sibson
Abstract

Main point: Sibson gives an O(n^2) algorithm for single-linkage clustering and proves that it matches the theoretical lower bound on the time needed to obtain a single-linkage dendrogram. This improves on the naive O(n^3) implementation of single-linkage clustering.

A single-linkage dendrogram is a tree in which each level corresponds to a different threshold dissimilarity h. At each level the nodes of a dataset are grouped into "equivalence classes" c(h), where two classes C_i and C_j are merged if there is a pair of "OTUs" (vertices) v_i ∈ C_i and v_j ∈ C_j whose dissimilarity is less than h, that is, D(v_i, v_j) < h.

For example, consider a set of 10 vertices v_1, ..., v_10 for which the dissimilarity matrix D is given below, with D_ij equal to the dissimilarity between v_i and v_j. Suppose we take four cutoff dissimilarities h_1, h_2, h_3, h_4 and produce the dendrogram according to these thresholds. Figure 1 illustrates how the 10 vertices are grouped into equivalence classes at each level. Since no dissimilarity is at or below 1, each vertex or "OTU" is its own equivalence class at the level corresponding to h_1 = 1. At the next level, however, some classes have been merged because several dissimilarities are below h_2 = 2. We can see that c(h_2) consists of 6 equivalence classes, c(h_3) has 3 equivalence classes, and c(h_4), with h_4 = 4, aggregates all the vertices into one equivalence class.

In single-linkage clustering, the number of levels in the tree is determined by the nearest-neighbor criterion: at each level at least one new merge is made between two clusters, and clusters C_i and C_j are merged if the minimal distance between vertices v_i ∈ C_i and v_j ∈ C_j is the smallest such distance across all pairs of clusters. In other words, the nearest neighbors between clusters C_j and C_i are found, and if these neighbors are closer than all the other nearest-neighbor pairs, then C_i and C …
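The threshold rule described in the abstract (merge two classes whenever some pair of vertices has D(v_i, v_j) < h) can be sketched in a few lines of Python, alongside an O(n^2) pass in the style of SLINK. This is a minimal sketch under stated assumptions, not the paper's own listing: the function names (slink, classes_direct, classes_from_pointers) and the five-point data set are hypothetical and do not come from the abstract, and the pointer arrays pi and lam follow the pointer-representation recurrence as it is commonly presented for SLINK, which the excerpt above does not spell out. The strict "< h" comparison is taken from the abstract's merge rule.

```python
import math


def slink(D):
    """O(n^2) single-link pass over a full symmetric dissimilarity matrix D.

    Returns (pi, lam): point j is absorbed at level lam[j] into the
    cluster whose highest-indexed member is pi[j] (assumed formulation).
    """
    n = len(D)
    pi = [0] * n
    lam = [math.inf] * n
    for i in range(n):
        # Add point i as its own cluster, never merged so far.
        pi[i] = i
        lam[i] = math.inf
        M = [D[i][j] for j in range(i)]  # dissimilarities from i to earlier points
        for j in range(i):
            if lam[j] >= M[j]:
                # i reaches j no later than j's current merge: re-point j to i.
                M[pi[j]] = min(M[pi[j]], lam[j])
                lam[j] = M[j]
                pi[j] = i
            else:
                M[pi[j]] = min(M[pi[j]], M[j])
        for j in range(i):
            # Keep every pointer aimed at the last member of its cluster.
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam


def _components(n, edges):
    """Group 0..n-1 into connected components with a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def classes_direct(D, h):
    """Equivalence classes c(h) straight from the rule D(v_i, v_j) < h."""
    n = len(D)
    edges = [(i, j) for i in range(n) for j in range(i) if D[i][j] < h]
    return _components(n, edges)


def classes_from_pointers(pi, lam, h):
    """The same classes, read off the pointer representation."""
    edges = [(j, pi[j]) for j in range(len(pi)) if lam[j] < h]
    return _components(len(pi), edges)


# Hypothetical data: five points on a line, dissimilarity = absolute difference.
x = [0, 10, 1, 6, 3]
D = [[abs(a - b) for b in x] for a in x]
pi, lam = slink(D)
print(classes_direct(D, 2.5))               # [[0, 2, 4], [1], [3]]
print(classes_from_pointers(pi, lam, 2.5))  # [[0, 2, 4], [1], [3]]
```

The direct version inspects every pair once per threshold, while the pointer arrays are built once in O(n^2) time with O(n) working memory beyond D and can then be cut at any h in roughly linear time.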




Journal:
  • Comput. J.

Volume 16

Publication date: 1973